Smart Paper Notes

Instance Semantic Segmentation Benefits from Generative Adversarial Networks

Quang H. Le, Kamal Youcef-Toumi, Dzmitry Tsetserukou, Ali Jahanian

Category: Computer Vision

2020-10-26

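In the design of instance segmentation networks that reconstruct masks, segmentation is often taken by its literal definition: assigning each pixel a label. This has led to treating the problem as a matching problem, where the goal is to minimize the loss between the reconstructed and the ground-truth pixels. Rethinking reconstruction networks as generators, we define the problem of predicting masks as a GAN game: a segmentation network generates the masks, and a discriminator network decides on the quality of the masks. To demonstrate this game, we show effective modifications to the general segmentation framework of Mask R-CNN. We find that playing the game in feature space is more effective than in pixel space and leads to stable training between the discriminator and the generator, that predicting object coordinates should be replaced by predicting contextual regions for objects, and that overall the adversarial loss helps performance and removes the need for any custom settings per data domain. We test our framework in various domains and report on cellphone recycling, autonomous driving, large-scale object detection, and medical glands. We observe that, in general, GANs yield masks that account for crisper boundaries, clutter, small objects, and details, in domains of regular shapes as well as heterogeneous and coalescing shapes. Our code for reproducing the results is publicly available.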

translated by Google Translate

Continual Learning: Fast and Slow

Quang Pham, Chenghao Liu, Steven C. H. Hoi

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-09-06

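According to the Complementary Learning Systems (CLS) theory~\cite{mcclelland1995there} in neuroscience, humans do effective \emph{continual learning} through two complementary systems: a fast learning system, centered on the hippocampus, for rapid learning of specifics and individual experiences; and a slow learning system, located in the neocortex, for the gradual acquisition of structured knowledge about the environment. Motivated by this theory, we propose \emph{DualNets} (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for learning task-agnostic general representations via Self-Supervised Learning (SSL). DualNets can seamlessly incorporate both representation types into a holistic framework to facilitate better continual learning in deep neural networks. Via extensive experiments, we demonstrate the promising results of DualNets on a wide range of continual learning protocols, from the standard offline, task-aware setting to the challenging online, task-free scenario. Notably, on the CTrL~\cite{veniat2020...} benchmark [...] \cite{ostapenko2021-continual}. Furthermore, we conduct comprehensive ablation studies to validate the efficacy, robustness, and scalability of DualNets. Code is publicly available at \url{https://github.com/phquang/dualnet}.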


On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error

Quang Minh Nguyen, Hoang H. Nguyen, Yi Zhou, Lam M. Nguyen

Category: Machine Learning

2022-02-08

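We study Unbalanced Optimal Transport (UOT) between two measures of possibly different masses with at most $n$ components, where the marginal constraints of standard Optimal Transport (OT) are relaxed via Kullback-Leibler divergence with regularization factor $\tau$. Although only Sinkhorn-based UOT solvers have been analyzed in the literature, with complexity $O\big(\tfrac{\tau n^2 \log(n)}{\varepsilon} \log\big(\tfrac{\log(n)}{\varepsilon}\big)\big)$ for achieving error $\varepsilon$, they are incompatible with some deep learning models and output dense transportation plans, which strongly hinders their practicality. While widely used as heuristics for computing UOT in modern deep learning applications, and successful in sparse OT, gradient methods for UOT have not been formally studied. To fill this gap, we propose a novel algorithm based on the Gradient Extrapolation Method (GEM-UOT) that finds an $\varepsilon$-approximate solution to the UOT problem in $O\big(\kappa n^2 \log\big(\frac{\tau n}{\varepsilon}\big)\big)$, where $\kappa$ is a condition number depending on the two input measures. Our algorithm is designed by optimizing a new dual formulation of the squared $\ell_2$-norm UOT objective, thereby filling the gap in the literature on sparse UOT. Finally, we establish a novel characterization of the approximation error between UOT and OT in terms of both the transportation plan and the transport distance. This result sheds light on a new major bottleneck neglected by the robust OT literature: although relaxing OT to UOT admits robustness to outliers, the computed UOT distance deviates far from the original OT distance. We address this limitation via a principled approach to retrieving OT from UOT, based on GEM-UOT with a fine-tuned $\tau$ and a post-processing projection step. Experiments on synthetic and real datasets validate our theory and demonstrate the favorable performance of our methods.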
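
For reference, a KL-relaxed UOT problem of the kind this abstract describes is commonly written as below. The cost matrix $C$, the transport plan $X$, and the input measures $a$, $b$ are notation assumed here for illustration only; the abstract fixes no symbols beyond $n$, $\tau$, $\varepsilon$, and $\kappa$, and the squared $\ell_2$-norm objective studied in the paper may carry an additional regularization term that is not shown.

```latex
\[
  \mathrm{UOT}_{\mathrm{KL}}(a, b)
  \;=\;
  \min_{X \in \mathbb{R}_{+}^{n \times n}}
  \;\langle C, X \rangle
  \;+\; \tau \, \mathrm{KL}\!\left(X \mathbf{1}_n \,\middle\|\, a\right)
  \;+\; \tau \, \mathrm{KL}\!\left(X^{\top} \mathbf{1}_n \,\middle\|\, b\right),
\]
\[
  \mathrm{KL}(x \,\|\, y) \;=\; \sum_{i=1}^{n} \Big( x_i \log \frac{x_i}{y_i} - x_i + y_i \Big).
\]
```

Standard OT corresponds to enforcing $X \mathbf{1}_n = a$ and $X^{\top} \mathbf{1}_n = b$ exactly, which is the limit of a very large $\tau$ when $a$ and $b$ have equal mass.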


DRG-Net: Interactive Joint Learning of Multi-lesion Segmentation and Classification for Diabetic Retinopathy Grading

Hasan Md Tusfiqur, Duy M. H. Nguyen, Mai T. N. Truong, Triet A. Nguyen, Binh T. Nguyen, Michael Barz, Hans-Juergen Profitlich, Ngoc T. T. Than, Ngan Le, Pengtao Xie

Category: Computer Vision

2022-12-30

Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.


Federated PCA on Grassmann Manifold for Anomaly Detection in IoT Networks

Tung-Anh Nguyen, Jiayu He, Long Tan Le, Wei Bao, Nguyen H. Tran

Category: Machine Learning

2022-12-23

In the era of Internet of Things (IoT), network-wide anomaly detection is a crucial part of monitoring IoT networks due to the inherent security vulnerabilities of most IoT devices. Principal Components Analysis (PCA) has been proposed to separate network traffics into two disjoint subspaces corresponding to normal and malicious behaviors for anomaly detection. However, the privacy concerns and limitations of devices' computing resources compromise the practical effectiveness of PCA. We propose a federated PCA-based Grassmannian optimization framework that coordinates IoT devices to aggregate a joint profile of normal network behaviors for anomaly detection. First, we introduce a privacy-preserving federated PCA framework to simultaneously capture the profile of various IoT devices' traffic. Then, we investigate the alternating direction method of multipliers gradient-based learning on the Grassmann manifold to guarantee fast training and the absence of detecting latency using limited computational resources. Empirical results on the NSL-KDD dataset demonstrate that our method outperforms baseline approaches. Finally, we show that the Grassmann manifold algorithm is highly adapted for IoT anomaly detection, which permits drastically reducing the analysis time of the system. To the best of our knowledge, this is the first federated PCA algorithm for anomaly detection meeting the requirements of IoT networks.
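
The subspace idea behind PCA-based anomaly detection can be illustrated with a minimal, centralized sketch. It is not the federated Grassmannian method of the paper: it fits a principal subspace to normal samples with a plain SVD and scores a point by how much of it falls outside that subspace. The function names, the synthetic data, and the choice of `k` are all assumptions made for illustration.

```python
import numpy as np

def fit_normal_subspace(X, k):
    """Return the mean and top-k principal directions of normal samples X (n_samples x n_features)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T  # shapes (d,) and (d, k); the columns are orthonormal

def anomaly_score(x, mean, U):
    """Squared norm of the part of x that lies outside the normal subspace."""
    centered = x - mean
    residual = centered - U @ (U.T @ centered)
    return float(residual @ residual)

rng = np.random.default_rng(0)
# Hypothetical "normal" traffic: 5-dimensional features close to a 2-dimensional subspace.
basis = rng.normal(size=(2, 5))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 5))
mean, U = fit_normal_subspace(normal, k=2)

typical = normal[0]
# Build an outlier by pushing a normal point along a direction orthogonal to the subspace.
direction = rng.normal(size=5)
direction -= U @ (U.T @ direction)
direction /= np.linalg.norm(direction)
outlier = typical + 3.0 * direction

typical_score = anomaly_score(typical, mean, U)
outlier_score = anomaly_score(outlier, mean, U)
print(typical_score, outlier_score)
```

A point is flagged when its score exceeds a threshold chosen on held-out normal data; how that threshold is set, and how the subspace is aggregated across devices, is left open by this sketch.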


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.


Scaling Instruction-Finetuned Language Models

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma

Categories: Machine Learning | Natural Language Processing

2022-10-20

Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.


Uncertainty-aware Label Distribution Learning for Facial Expression Recognition

Nhat Le, Khanh Nguyen, Quang Tran, Erman Tjiputra, Bac Le, Anh Nguyen

Category: Computer Vision

2022-09-21

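Despite significant progress over the past few years, ambiguity is still a key challenge in Facial Expression Recognition (FER). It can lead to noisy and inconsistent annotations, which hinder the performance of deep learning models in real-world scenarios. In this paper, we propose a new uncertainty-aware label distribution learning method to improve the robustness of deep models against uncertainty and ambiguity. We leverage neighborhood information in the valence-arousal space to adaptively construct emotion distributions for training samples. We also consider the uncertainty of the provided labels when incorporating them into the label distributions. Our method can be easily integrated into a deep network to obtain more training supervision and improve recognition accuracy. Intensive experiments on several datasets under various noisy and ambiguous settings show that our method achieves competitive results and outperforms recent state-of-the-art approaches. Our code and models are available at https://github.com/minhnhatvt/label-distribution-learning-fer-tf.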


LAVIS: A Library for Language-Vision Intelligence

Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi

Categories: Computer Vision | Natural Language Processing | Machine Learning

2022-09-15

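We introduce LAVIS, an open-source deep learning library for language-vision research and applications. LAVIS aims to serve as a one-stop comprehensive library that makes recent advances in the language-vision field accessible to researchers and practitioners and supports future research and development. It features a unified interface for easy access to state-of-the-art image-language and video-language models and common datasets. LAVIS supports training, evaluation, and benchmarking on a rich variety of tasks, including multimodal classification, retrieval, captioning, visual question answering, dialogue, and pre-training. At the same time, the library is highly extensible and configurable, facilitating future development and customization. In this technical report, we describe the design principles, key components, and functionalities of the library, and present benchmarking results across common language-vision tasks. The library is available at: https://github.com/salesforce/lavis.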


Learning ASR pathways: A sparse multilingual ASR model

Mu Yang, Andros Tjandra, Chunxi Liu, David Zhang, Duc Le, John H. L. Hansen, Ozlem Kalinli

Category: Natural Language Processing

2022-09-13

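Neural network pruning can be used effectively to compress automatic speech recognition (ASR) models. However, in multilingual ASR, language-agnostic pruning may lead to severe performance degradation on some languages, because language-agnostic pruning masks may not fit all languages and may discard important language-specific parameters. In this work, we present ASR pathways, a sparse multilingual ASR model that activates language-specific sub-networks ("pathways"), so that the parameters for each language are learned explicitly. With overlapping sub-networks, the shared parameters also enable knowledge transfer to lower-resource languages via joint multilingual training. We propose a novel algorithm to learn ASR pathways and evaluate the proposed method on 4 languages with a streaming RNN-T model. Our proposed ASR pathways outperform both dense models (-5.0% on average) and a language-agnostically pruned model (-21.4% on average), and provide better performance on low-resource languages than monolingual sparse models.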
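
The notion of language-specific sub-networks over shared weights can be sketched in a few lines. This is only an illustration of masked forward passes and masked updates, not the learning algorithm proposed in the paper: the masks below are random placeholders, and the language codes, layer shape, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))  # one shared dense weight matrix

# Hypothetical language-specific binary masks ("pathways"), each keeping about half the weights.
masks = {lang: rng.random(W.shape) < 0.5 for lang in ("en", "fr")}

def forward(x, lang):
    """Apply only the sub-network that belongs to `lang`."""
    return (W * masks[lang]) @ x

def sgd_step(grad, lang, lr=0.1):
    """Update only the weights on the pathway of `lang`; all other weights stay untouched."""
    W[masks[lang]] -= lr * grad[masks[lang]]

# Weights that sit on both pathways are trained by both languages.
shared = masks["en"] & masks["fr"]

before = W.copy()
sgd_step(np.ones_like(W), "en")  # a dummy gradient, for illustration only
changed = W != before
output = forward(np.ones(6), "fr")
```

Because an update for one language also moves the weights in `shared`, training on a high-resource language can affect the sub-network of another language, which is the kind of knowledge transfer the abstract attributes to overlapping sub-networks.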
